Abstract



A problem frequently confronted in IRT applications is that the item parameters 
calibrated using more than two independent samples of subjects must be expressed on the 
same scale. The existing methods were developed for a pairwise transformation, that is, 
from one scale to the other. The purpose of the present study is to introduce a common 
scale transformation method which can simultaneously find a vector of transformation 
functions for placing the parameter estimates from two or more item pools on a specified 
common scale. Two examples are presented to illustrate the usefulness of the method.

An IRT Scale Transformation Method For
Parameters Calibrated From Multiple Samples of Subjects

Because of the indeterminate nature of latent variable IRT models, the parameter
estimates obtained from different independent calibrations may not be on the same scale. A linear
transformation can be performed to place them on an arbitrary scale while preserving the same item
characteristic functions. The scale transformation procedure requires that at least some of the items
be common to the different calibrations. A number of procedures have been proposed for IRT scale
transformation. Marco (1977) proposed using the mean and standard deviation of the b-parameter
estimates of the common items to determine the transformation function. Loyd and Hoover (1980)
described a method which used the mean of the b-parameter estimates and the mean of the a-parameter
estimates. Haebara (1980) and Stocking and Lord (1983) introduced loss-minimization methods for
computing the transformation coefficients. These methods are based on minimizing a loss function
that reflects the errors involved in transforming the parameter estimates from one
metric to another. Baker and Al-Karni (1991) compared a loss-minimization method (Stocking &
Lord, 1983) with a method based on summary statistics (Loyd & Hoover, 1980). They found that the
loss-minimization method was less sensitive to atypical combinations of underlying ability, item
difficulty, and discrimination than the method based on summary statistics.

IRT parameter scale transformation is a directional process, that is, from one scale to another
scale. Baker and Al-Karni (1991) used the terms "from" and "to" to make the direction of
transformation explicit. The existing methods are all for pairwise transformation, that is, there is
one "from" scale and one "to" scale. If parameter estimates from more than two item pools need to
be compared, the current practice is to find the transformation functions in a number of
independent pairwise processes. The purpose of the present study is to introduce a common scale
transformation method which can find a vector of transformation functions that place parameter
estimates from two or more item pools on a common scale in one minimization process. Two
examples are presented to illustrate the usefulness of the method.

Method

According to the three-parameter logistic model, $P_i(\theta; a_i, b_i, c_i)$, the probability of a correct
response to item $i$ by a person with ability level $\theta$, is defined as:

$$P_i(\theta; a_i, b_i, c_i) = c_i + \frac{1 - c_i}{1 + e^{-1.7 a_i (\theta - b_i)}} \tag{1}$$

where $a_i$ is usually called the item discrimination, $b_i$ the item difficulty, and $c_i$ the probability that a
person with a very low ability gives a correct response. By the nature of this logistic model, the
origin and unit of the ability and difficulty metric are undetermined. A linear relationship exists
between any pair of IRT parameter scales (Lord, 1980).
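Equation 1 can be evaluated directly. The following Python sketch is illustrative only: the function name `p3pl` and the item values are hypothetical and do not come from the data used in this paper.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Equation 1: probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item, not taken from the paper's data.
a, b, c = 1.2, 0.5, 0.2

p_at_b = p3pl(b, a, b, c)       # halfway between c and 1 when theta equals b
p_low = p3pl(-10.0, a, b, c)    # approaches the lower asymptote c
p_high = p3pl(10.0, a, b, c)    # approaches 1
print(p_at_b, p_low, p_high)
```

The three evaluations show the roles of the parameters: $b_i$ locates the midpoint of the curve, $c_i$ is its lower asymptote, and $a_i$ controls how quickly the curve rises between them.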

Transformation for Two Item Pools 

Consider two item pools, Pool 1 and Pool 2, which share a set of $k$ common items. Pool 1
and Pool 2 were calibrated using two nonequivalent random samples drawn from Population 1 and
Population 2, respectively. Let $\omega_{1i}$ denote the vector of item parameter estimates $a_{1i}$, $b_{1i}$ and $c_{1i}$
for common item $i$ in Pool 1, and let $\omega_{2i}$ denote the vector of item parameter estimates $a_{2i}$, $b_{2i}$ and
$c_{2i}$ for common item $i$ in Pool 2. Because $\omega_{1i}$ and $\omega_{2i}$ are on different parameter scales,
$P(\theta, \omega_{1i})$ and $P(\theta, \omega_{2i})$ are not necessarily the same even though they are for the same item.

Suppose $\omega_{2i}$ is transformed via a linear transformation to $\omega_{(1,2)i}$, which is on the
parameter scale defined in calibrating Pool 1, such that

$$a_{(1,2)i} = \frac{a_{2i}}{A_{(1,2)}}, \quad b_{(1,2)i} = A_{(1,2)} b_{2i} + B_{(1,2)}, \quad \text{and} \quad c_{(1,2)i} = c_{2i}, \tag{2}$$

where $A_{(1,2)}$ and $B_{(1,2)}$ are the slope and intercept of the linear transformation function. After these
transformations, $\omega_{1i}$ and $\omega_{(1,2)i}$ are on the same parameter scale. Let $\theta_1$ denote an ability level on
the scale defined in calibrating Pool 1; then the following relationship can be established:

$$P(\theta_1, \omega_{1i}) \approx P(\theta_1, \omega_{(1,2)i}). \tag{3}$$
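The reason a linear transformation of this form preserves the item characteristic function can be checked numerically. In the sketch below, the coefficients and the item are hypothetical; the check is that an item evaluated at an ability on scale 2 gives the same probability as the transformed item evaluated at the correspondingly transformed ability on scale 1.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL response probability (Equation 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def to_scale1(item2, A, B):
    """Equation 2: a is divided by the slope, b is mapped linearly, c is unchanged."""
    a2, b2, c2 = item2
    return (a2 / A, A * b2 + B, c2)

# Hypothetical transformation coefficients and Pool 2 item.
A, B = 1.15, 0.19
item2 = (0.99, 0.53, 0.20)
item12 = to_scale1(item2, A, B)

theta2 = 0.4               # an ability on scale 2
theta1 = A * theta2 + B    # the same ability expressed on scale 1

p_before = p3pl(theta2, *item2)
p_after = p3pl(theta1, *item12)
print(p_before, p_after)
```

The two probabilities agree because $(a_{2i}/A)(A\theta_2 + B - A b_{2i} - B) = a_{2i}(\theta_2 - b_{2i})$, so the exponent in Equation 1 is unchanged.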
Suppose the abilities of the subjects in Population 1 have a known probability density function (pdf)
$g(\theta_1)$. The expected proportion correct (EPC) score for item $i$ given $\omega_{1i}$ for the potential subjects
from Population 1 is

$$S(\omega_{1i}) = \int_{-\infty}^{\infty} P(\theta_1, \omega_{1i})\, g(\theta_1)\, d\theta_1, \tag{4}$$

and the EPC score for item $i$ given $\omega_{(1,2)i}$ for the same population of subjects is

$$S(\omega_{(1,2)i}) = \int_{-\infty}^{\infty} P(\theta_1, \omega_{(1,2)i})\, g(\theta_1)\, d\theta_1. \tag{5}$$

From Equation 3 we know that Equations 4 and 5 are approximately equal, that is

$$S(\omega_{1i}) \approx S(\omega_{(1,2)i}), \tag{6}$$

or

$$\int_{-\infty}^{\infty} P(\theta_1, \omega_{1i})\, g(\theta_1)\, d\theta_1 \approx \int_{-\infty}^{\infty} P(\theta_1, \omega_{(1,2)i})\, g(\theta_1)\, d\theta_1. \tag{7}$$

For one item and identical c-parameter estimates, we can find a transformation that will make the
left and right sides of Equations 6 and 7 exactly the same. For two or more items, there may not
exist a transformation that will make the left and right sides of Equations 6 and 7 exactly the same
for all items. The difference for item $i$ is

$$\delta_i = \int_{-\infty}^{\infty} P(\theta_1, \omega_{1i})\, g(\theta_1)\, d\theta_1 - \int_{-\infty}^{\infty} P(\theta_1, \omega_{(1,2)i})\, g(\theta_1)\, d\theta_1. \tag{8}$$

By minimizing $\delta_i$ for all the $k$ common items, the common scale transformation coefficients $A_{(1,2)}$
and $B_{(1,2)}$ can be found. Thus, a loss function can be defined as

$$Q(A_{(1,2)}, B_{(1,2)}) = \sum_{i=1}^{k} \delta_i^2. \tag{9}$$

A set of transformation coefficients which minimize the loss function $Q$ can be found by solving the
following equation system:

$$\frac{\partial Q(A_{(1,2)}, B_{(1,2)})}{\partial A_{(1,2)}} = 0, \qquad \frac{\partial Q(A_{(1,2)}, B_{(1,2)})}{\partial B_{(1,2)}} = 0. \tag{10}$$

The partial derivatives needed in Equation System 10 are derived as follows.

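$$\frac{\partial Q(A_{(1,2)}, B_{(1,2)})}{\partial A_{(1,2)}} = -2 \sum_{i=1}^{k} \delta_i \frac{\partial S(\omega_{(1,2)i})}{\partial A_{(1,2)}}, \tag{11}$$

where

$$\frac{\partial S(\omega_{(1,2)i})}{\partial A_{(1,2)}} = \int_{-\infty}^{\infty} \frac{-1.7 (1 - c_{2i})\, a_{2i} (\theta_1 - B_{(1,2)})\, e^{-1.7 a_{2i} (\theta_1 - B_{(1,2)} - A_{(1,2)} b_{2i}) / A_{(1,2)}}}{A_{(1,2)}^2 \left[ 1 + e^{-1.7 a_{2i} (\theta_1 - B_{(1,2)} - A_{(1,2)} b_{2i}) / A_{(1,2)}} \right]^2}\, g(\theta_1)\, d\theta_1, \tag{12}$$

and

$$\frac{\partial Q(A_{(1,2)}, B_{(1,2)})}{\partial B_{(1,2)}} = -2 \sum_{i=1}^{k} \delta_i \frac{\partial S(\omega_{(1,2)i})}{\partial B_{(1,2)}}, \tag{13}$$

where

$$\frac{\partial S(\omega_{(1,2)i})}{\partial B_{(1,2)}} = \int_{-\infty}^{\infty} \frac{-1.7 (1 - c_{2i})\, a_{2i}\, e^{-1.7 a_{2i} (\theta_1 - B_{(1,2)} - A_{(1,2)} b_{2i}) / A_{(1,2)}}}{A_{(1,2)} \left[ 1 + e^{-1.7 a_{2i} (\theta_1 - B_{(1,2)} - A_{(1,2)} b_{2i}) / A_{(1,2)}} \right]^2}\, g(\theta_1)\, d\theta_1. \tag{14}$$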
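Analytic derivatives of this kind are easy to get wrong in code, so a finite-difference comparison is a useful check. The sketch below is a minimal illustration under stated assumptions: the integral is replaced by an equally weighted grid of 41 points on [-4, 4], and the two small pools, the evaluation point, and all function names are hypothetical.

```python
import math

D = 1.7

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def grid(lo=-4.0, hi=4.0, n=41):
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)], [1.0 / n] * n

def epc(item, pts, wts):
    return sum(w * p3pl(t, *item) for t, w in zip(pts, wts))

def d_epc(item2, A, B, pts, wts):
    """(dS/dA, dS/dB) for the transformed item, following Equations 12 and 14."""
    a2, b2, c2 = item2
    dA = dB = 0.0
    for t, w in zip(pts, wts):
        e = math.exp(-D * a2 * (t - B - A * b2) / A)
        common = -D * (1.0 - c2) * a2 * e / (1.0 + e) ** 2
        dA += w * common * (t - B) / A ** 2
        dB += w * common / A
    return dA, dB

def loss_and_grad(A, B, pool1, pool2, pts, wts):
    """Q of Equation 9 with its gradient from Equations 11 and 13."""
    q = gA = gB = 0.0
    for item1, item2 in zip(pool1, pool2):
        a2, b2, c2 = item2
        delta = epc(item1, pts, wts) - epc((a2 / A, A * b2 + B, c2), pts, wts)
        dA, dB = d_epc(item2, A, B, pts, wts)
        q += delta ** 2
        gA += -2.0 * delta * dA
        gB += -2.0 * delta * dB
    return q, gA, gB

# Hypothetical common items on two scales.
pool1 = [(1.0, 0.0, 0.20), (0.7, 1.0, 0.15), (1.4, -0.8, 0.25)]
pool2 = [(1.1, -0.3, 0.20), (0.8, 0.6, 0.15), (1.5, -1.0, 0.25)]
pts, wts = grid()
A, B = 1.2, 0.3
q, gA, gB = loss_and_grad(A, B, pool1, pool2, pts, wts)

def q_only(A_, B_):
    return loss_and_grad(A_, B_, pool1, pool2, pts, wts)[0]

h = 1e-6
num_gA = (q_only(A + h, B) - q_only(A - h, B)) / (2.0 * h)
num_gB = (q_only(A, B + h) - q_only(A, B - h)) / (2.0 * h)
print(gA - num_gA, gB - num_gB)
```

Agreement between the analytic and central-difference values indicates that the signs and the powers of $A_{(1,2)}$ in Equations 12 and 14 have been coded consistently with Equation 2.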



The integrals involved in the partial derivatives can be approximated numerically to any
specified degree of precision using a Gauss quadrature formula. The pdf $g$ can be estimated from
the calibration samples. If the marginal maximum likelihood estimation method (e.g., BILOG;
Mislevy & Bock, 1991) is used to calibrate the item parameters, an empirical $\theta$ distribution is
estimated along with the item parameters. The empirical $\theta$ distribution consists of a set of quadrature
points and associated weights which can be used in the loss function.

Alternatively, any distribution that is on the $\theta$ scale of the Pool 1 items can be used for
computing the loss function, because Equations 3, 6 and 7 hold for any $\theta$ value. A simple choice is
a uniform distribution. Using a uniform $\theta$ distribution to compute the expected proportion correct
score is analogous to using an arbitrary set of points along the $\theta$ scale to compute the true score.
This procedure is used in an implementation of the Stocking & Lord method (Baker & Al-Karni,
1991).
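With points and weights in hand, the EPC score and the loss function of Equation 9 reduce to weighted sums. The following Python sketch assumes an equally weighted grid on [-4, 4] as the uniform distribution; the function names and the small item pool are hypothetical, and quadrature points and weights from a calibration run could be passed in place of the grid.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def uniform_quadrature(lo=-4.0, hi=4.0, n=41):
    """Equally spaced points with equal weights: a discrete uniform stand-in for g."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)], [1.0 / n] * n

def epc(item, points, weights):
    """Quadrature approximation of the EPC score S (Equations 4 and 5)."""
    return sum(w * p3pl(t, *item) for t, w in zip(points, weights))

def loss(A, B, pool1, pool2, points, weights):
    """Equation 9: sum over common items of squared EPC differences."""
    total = 0.0
    for item1, (a2, b2, c2) in zip(pool1, pool2):
        item12 = (a2 / A, A * b2 + B, c2)   # Equation 2
        total += (epc(item1, points, weights) - epc(item12, points, weights)) ** 2
    return total

# Hypothetical common items on scale 1, re-expressed exactly on a second scale.
pool1 = [(0.9, -0.5, 0.20), (1.2, 0.3, 0.15), (0.7, 1.4, 0.25)]
A0, B0 = 1.15, 0.20
pool2 = [(a * A0, (b - B0) / A0, c) for a, b, c in pool1]

points, weights = uniform_quadrature()
q_at_truth = loss(A0, B0, pool1, pool2, points, weights)
q_untransformed = loss(1.0, 0.0, pool1, pool2, points, weights)
print(q_at_truth, q_untransformed)
```

Because the second pool here is an exact linear re-expression of the first, the loss vanishes (up to rounding) at the generating coefficients and is positive when no transformation is applied.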

The loss function defined in Equation 9 is different from the loss function in the Stocking &
Lord method. The major difference is that Stocking & Lord's loss function is based on the squared
difference between the two true scores computed with the two sets of parameter estimates for the
common items, whereas the loss function introduced in the present study is based on the squared
differences between two EPC scores summed over the individual common items.

The loss functions of the proposed method and Stocking & Lord's method differ from that
of Haebara's method in two respects. First, Haebara's loss function consists of two components:
one is based on the discrepancies resulting from transforming scale 2 to scale 1, and the other is based on
the discrepancies resulting from transforming scale 1 to scale 2 using the inverse of the scale 2 to scale 1
transformation function. The loss functions of the proposed and Stocking & Lord's methods do not
contain the second component in Haebara's loss function. Second, in Haebara's loss function, the
discrepancies between two sets of item parameter estimates are squared at various individual ability
levels for all the individual common items before the summation over items. An investigation of the
advantages and disadvantages of these loss functions is beyond the scope of the present paper but
certainly deserves further study.
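The structural difference among the three criteria is where the squaring takes place. The sketch below is a simplified illustration and not a faithful implementation of any published program: `true_score_loss` follows the Stocking & Lord idea of squaring the difference of summed probabilities at each ability point, `item_level_loss` includes only the first (scale 2 to scale 1) component described above for Haebara's criterion, and the grid, items, and function names are assumptions.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def transform(item2, A, B):
    a2, b2, c2 = item2
    return (a2 / A, A * b2 + B, c2)

PTS = [-4.0 + 0.2 * n for n in range(41)]
WTS = [1.0 / len(PTS)] * len(PTS)

def epc_loss(A, B, pool1, pool2):
    """Proposed criterion: average over ability first, then square, then sum over items."""
    total = 0.0
    for it1, it2 in zip(pool1, pool2):
        it12 = transform(it2, A, B)
        d = sum(w * (p3pl(t, *it1) - p3pl(t, *it12)) for t, w in zip(PTS, WTS))
        total += d * d
    return total

def true_score_loss(A, B, pool1, pool2):
    """Stocking & Lord style: sum over items first, then square at each ability point."""
    total = 0.0
    for t, w in zip(PTS, WTS):
        d = sum(p3pl(t, *it1) - p3pl(t, *transform(it2, A, B))
                for it1, it2 in zip(pool1, pool2))
        total += w * d * d
    return total

def item_level_loss(A, B, pool1, pool2):
    """Haebara style, first component only: square per item at each ability point."""
    total = 0.0
    for t, w in zip(PTS, WTS):
        for it1, it2 in zip(pool1, pool2):
            d = p3pl(t, *it1) - p3pl(t, *transform(it2, A, B))
            total += w * d * d
    return total

# Hypothetical items; pool2 is an exact re-expression of pool1.
pool1 = [(0.9, -0.5, 0.20), (1.2, 0.3, 0.15), (0.7, 1.4, 0.25)]
A0, B0 = 1.15, 0.20
pool2 = [(a * A0, (b - B0) / A0, c) for a, b, c in pool1]

at_truth = [f(A0, B0, pool1, pool2) for f in (epc_loss, true_score_loss, item_level_loss)]
off_truth = [f(1.0, 0.0, pool1, pool2) for f in (epc_loss, true_score_loss, item_level_loss)]
print(at_truth, off_truth)
```

All three criteria vanish when the transformation is exact. Away from it, the EPC-based value cannot exceed the item-level value computed with the same weights, because the square of a weighted mean is never larger than the weighted mean of squares.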

Using the EPC scores has some potential advantages in developing an iterative procedure for
finding a common scale for item pools that involve differential item functioning (DIF) items.
Some authors (e.g., Kim & Cohen, 1992; Lautenschlager & Park, 1988) pointed out that the scale
transformation may be seriously affected by the presence of DIF items. After an iteration of scale
transformation, the EPC scores can be computed for all common items. If the EPC scores differ too
much for a common item, then this item may be suspected of being a DIF item. The next iteration of scale
transformation can be performed without the suspected DIF item, and the EPC scores are compared
again to detect additional suspected DIF items. The iterative procedure stops when no more
suspected DIF items remain in the common item set. Detailed technical and practical issues
will be addressed in another study.
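The control flow of such an iterative procedure can be sketched as follows. This is only an outline under assumptions that the paper leaves open: the tolerance, the rule of dropping one item per iteration, the minimum number of anchor items, and the function names are all hypothetical, and `toy_gaps` is a stand-in for the real step of refitting the transformation and comparing EPC scores.

```python
def screen_anchor_items(item_ids, gaps_after_fit, tol, min_items=2):
    """Drop the common item with the largest EPC gap until all gaps are within tol.

    gaps_after_fit(anchors) is expected to refit the transformation on the
    current anchor set and return {item id: absolute EPC difference}.
    """
    anchors = list(item_ids)
    flagged = []
    while len(anchors) > min_items:
        gaps = gaps_after_fit(anchors)
        worst = max(anchors, key=lambda i: gaps[i])
        if gaps[worst] <= tol:
            break                      # no suspected DIF items remain
        anchors.remove(worst)
        flagged.append(worst)
    return anchors, flagged

def toy_gaps(anchors):
    """Hypothetical stand-in for 'refit, then compare EPC scores'."""
    base = {1: 0.004, 2: 0.006, 3: 0.050, 4: 0.003, 5: 0.030}
    # Pretend that removing item 3 lets the fit settle and halves the other gaps.
    shrink = 1.0 if 3 in anchors else 0.5
    return {i: base[i] * shrink for i in anchors}

anchors, flagged = screen_anchor_items([1, 2, 3, 4, 5], toy_gaps, tol=0.02)
print(anchors, flagged)
```

In the toy data, item 5 looks suspicious only while item 3 distorts the fit, which is the kind of situation an iterative refit is meant to handle.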

Transformation for Multiple Item Pools 

The method developed for scale transformation for two item pools can be extended to three
or more item pools. First consider an example of three item pools. Pool 1 and Pool 2 share a set of
items that are common to both, and Pool 2 and Pool 3 share a set of items that are common to both.
Pool 1 and Pool 2 have a direct link because they have a common item set. Pool 1 and Pool 3 do not
have a direct link because they do not have a common item set, but they have an indirect link because
both pools have a direct link to Pool 2. If we want to compare items in Pool 1 with items in Pool 3
using pairwise transformation methods, we need to first transform the parameters of the items in Pool 1 to
the scale of Pool 2, then use the Pool 2 to Pool 3 transformation function to further transform them to
the Pool 3 scale. Alternatively, Pool 2 can be transformed to Pool 1, and Pool 3 can be transformed to
Pool 2, then to Pool 1. In the common scale transformation method, a vector of transformation
functions that transform the parameter estimates of all the "from" scales to the "to" scale is found in
one minimization process.
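For comparison, the pairwise route amounts to composing linear maps along a chain of directly linked pools. The sketch below is illustrative: the dictionary layout (keys are `(from_pool, to_pool)` pairs), the function names, and the coefficient values are assumptions. A breadth-first search finds a chain from the source pool to the target pool and composes the slopes and intercepts along it.

```python
from collections import deque

def compose(outer, inner):
    """Coefficients of the map x -> outer(inner(x)) for linear maps (A, B)."""
    A_o, B_o = outer
    A_i, B_i = inner
    return (A_o * A_i, A_o * B_i + B_o)

def chain_to_target(pairwise, source, target):
    """Compose pairwise maps along a chain of pools from source to target.

    pairwise[(j, k)] = (A, B) maps scale j onto scale k. Returns None when
    the available maps do not connect source to target.
    """
    queue = deque([(source, (1.0, 0.0))])
    seen = {source}
    while queue:
        pool, coeffs = queue.popleft()
        if pool == target:
            return coeffs
        for (j, k), step in pairwise.items():
            if j == pool and k not in seen:
                seen.add(k)
                queue.append((k, compose(step, coeffs)))
    return None

# Hypothetical pairwise results: Pool 3 -> Pool 2 and Pool 2 -> Pool 1.
pairwise = {(3, 2): (1.05, -0.10), (2, 1): (1.15, 0.19)}
A31, B31 = chain_to_target(pairwise, 3, 1)

b3 = 0.5                                   # a difficulty on the Pool 3 scale
b_two_steps = 1.15 * (1.05 * b3 - 0.10) + 0.19
b_composed = A31 * b3 + B31
print(A31, B31, b_two_steps, b_composed)
```

Each pairwise link carries its own estimation error, and the composed coefficients inherit all of them, which is one motivation for estimating every transformation in a single minimization.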

In the general case, $m$ ($m > 2$) item pools are calibrated using $m$ independent samples of
subjects. In order to perform a common scale transformation, we assume that every pair of the $m$
item pools is linked either directly by a common set of items, or indirectly through a chain of pools
which are directly linked in pairs. Let us assume that the scale of Pool 1 items is referenced as the
common scale, so the parameter estimates of the other $m - 1$ item pools are to be transformed to this
scale. The selection of the common scale is arbitrary; any of the $m$ scales can be referenced as the
common scale. Let $\omega_{(1,j)i}$ denote the parameter estimates of common item $i$ in
Pool $j$ transformed to scale 1, that is

$$a_{(1,j)i} = \frac{a_{ji}}{A_{(1,j)}}, \quad b_{(1,j)i} = A_{(1,j)} b_{ji} + B_{(1,j)}, \quad \text{and} \quad c_{(1,j)i} = c_{ji}, \tag{15}$$

where $A_{(1,j)}$ and $B_{(1,j)}$ are the slope and intercept of the linear transformation function that
transforms the parameter estimates from scale $j$ to scale 1, and $j = 2$ to $m$. For Pool 1 itself no
transformation is needed, so $\omega_{(1,1)i} = \omega_{1i}$.

Let $\Phi$ denote the set of indices for all the items that are common to at least two item pools. Let
$\psi_i$ denote the set of indices for the pools that contain item $i$. For example, $\psi_i = \{1, 2, 3\}$ means that
item $i$ is a common item shared by Pools 1, 2 and 3. After all the parameters are transformed to
scale 1, the expected proportion correct scores for item $i$, $i \in \Phi$, given the transformed item
parameters $\omega_{(1,j)i}$, $j \in \psi_i$, will be close to each other. We can have the relationship:

$$S(\omega_{(1,j)i}) \approx S(\omega_{(1,k)i}), \quad \text{for all } j, k \in \psi_i \text{ and } j \neq k. \tag{16}$$

Let $\mathbf{T}$ denote a vector whose elements are $A_{(1,j)}$ and $B_{(1,j)}$, $j = 2$ to $m$. The loss function $Q(\mathbf{T})$ can be
defined as

$$Q(\mathbf{T}) = \sum_{i \in \Phi} \; \sum_{k \in \psi_i} \; \sum_{j \in \psi_i,\, j > k} \delta^2(\omega_{(1,k)i}, \omega_{(1,j)i}), \tag{17}$$

where

$$\delta(\omega_{(1,k)i}, \omega_{(1,j)i}) = \int_{-\infty}^{\infty} P(\theta_1, \omega_{(1,k)i})\, g(\theta_1)\, d\theta_1 - \int_{-\infty}^{\infty} P(\theta_1, \omega_{(1,j)i})\, g(\theta_1)\, d\theta_1, \tag{18}$$

and $g(\theta_1)$ is the ability pdf estimated in the calibration of Pool 1 items. Alternatively, the $g(\theta_1)$ in
Equation 18 can be replaced by a uniform distribution. Equation 17 can be minimized by setting its
first order partial derivatives to zero, that is

$$\frac{\partial Q(\mathbf{T})}{\partial A_{(1,j)}} = 0 \quad \text{and} \quad \frac{\partial Q(\mathbf{T})}{\partial B_{(1,j)}} = 0, \quad \text{for } j = 2 \text{ to } m. \tag{19}$$
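The multiple-pool loss function sums squared EPC differences over every pair of pools that share an item. The following Python sketch is one way to organize that computation, not the author's program: the data layout (a dictionary of pools, each mapping item ids to parameter triples), the equally weighted grid standing in for $g(\theta_1)$, and all item values are assumptions. In the example, Pools 1 and 3 share no items and are linked only through Pool 2.

```python
import math
from itertools import combinations

def p3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Discrete uniform stand-in for g(theta_1) on [-4, 4] (an assumption).
PTS = [-4.0 + 0.2 * n for n in range(41)]

def epc(item):
    return sum(p3pl(t, *item) for t in PTS) / len(PTS)

def multi_pool_loss(T, pools):
    """Loss over all pools. T maps pool id j >= 2 to (A, B) for scale j -> scale 1;
    pools maps pool id to {item id: (a, b, c)}. Pool 1 is the common scale."""
    coeffs = {1: (1.0, 0.0)}
    coeffs.update(T)
    psi = {}                                   # psi[i]: pools that contain item i
    for j, items in pools.items():
        for i in items:
            psi.setdefault(i, []).append(j)
    q = 0.0
    for i, members in psi.items():
        if len(members) < 2:
            continue                           # item i is not a common item
        s = {}
        for j in members:
            A, B = coeffs[j]
            a, b, c = pools[j][i]
            s[j] = epc((a / A, A * b + B, c))  # EPC of the transformed item
        for k, j in combinations(sorted(members), 2):
            q += (s[k] - s[j]) ** 2
    return q

def on_scale(item, A, B):
    """Re-express scale-1 parameters on a scale whose map back to scale 1 is (A, B)."""
    a, b, c = item
    return (a * A, (b - B) / A, c)

# Hypothetical scale-1 parameters for six items and hypothetical pool scales.
true_items = {1: (0.8, -1.0, 0.20), 2: (1.1, -0.3, 0.20), 3: (0.9, 0.4, 0.15),
              4: (1.3, 0.9, 0.25), 5: (0.7, 1.5, 0.20), 6: (1.0, 2.0, 0.20)}
true_T = {2: (1.10, 0.20), 3: (0.90, -0.30)}
membership = {1: [1, 2, 3], 2: [2, 3, 4, 5], 3: [4, 5, 6]}

pools = {}
for j, ids in membership.items():
    A, B = (1.0, 0.0) if j == 1 else true_T[j]
    pools[j] = {i: on_scale(true_items[i], A, B) for i in ids}

q_true = multi_pool_loss(true_T, pools)
q_identity = multi_pool_loss({2: (1.0, 0.0), 3: (1.0, 0.0)}, pools)
print(q_true, q_identity)
```

Items 2 and 3 tie Pool 2 to Pool 1, and items 4 and 5 tie Pool 3 to Pool 2; the coefficients for Pool 3 are therefore determined only jointly with those for Pool 2, which is what the single minimization exploits.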



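The first order partial derivatives are derived as follows:

$$\frac{\partial Q(\mathbf{T})}{\partial A_{(1,j)}} = 2 \sum_{i \in \Phi:\, j \in \psi_i} \left\{ \sum_{k \in \psi_i,\, k > j} \delta(\omega_{(1,j)i}, \omega_{(1,k)i}) - \sum_{k \in \psi_i,\, k < j} \delta(\omega_{(1,k)i}, \omega_{(1,j)i}) \right\} \frac{\partial S(\omega_{(1,j)i})}{\partial A_{(1,j)}}, \tag{20}$$

where

$$\frac{\partial S(\omega_{(1,j)i})}{\partial A_{(1,j)}} = \int_{-\infty}^{\infty} \frac{-1.7 (1 - c_{ji})\, a_{ji} (\theta_1 - B_{(1,j)})\, e^{-1.7 a_{ji} (\theta_1 - B_{(1,j)} - A_{(1,j)} b_{ji}) / A_{(1,j)}}}{A_{(1,j)}^2 \left[ 1 + e^{-1.7 a_{ji} (\theta_1 - B_{(1,j)} - A_{(1,j)} b_{ji}) / A_{(1,j)}} \right]^2}\, g(\theta_1)\, d\theta_1, \tag{21}$$

and

$$\frac{\partial Q(\mathbf{T})}{\partial B_{(1,j)}} = 2 \sum_{i \in \Phi:\, j \in \psi_i} \left\{ \sum_{k \in \psi_i,\, k > j} \delta(\omega_{(1,j)i}, \omega_{(1,k)i}) - \sum_{k \in \psi_i,\, k < j} \delta(\omega_{(1,k)i}, \omega_{(1,j)i}) \right\} \frac{\partial S(\omega_{(1,j)i})}{\partial B_{(1,j)}}, \tag{22}$$

where

$$\frac{\partial S(\omega_{(1,j)i})}{\partial B_{(1,j)}} = \int_{-\infty}^{\infty} \frac{-1.7 (1 - c_{ji})\, a_{ji}\, e^{-1.7 a_{ji} (\theta_1 - B_{(1,j)} - A_{(1,j)} b_{ji}) / A_{(1,j)}}}{A_{(1,j)} \left[ 1 + e^{-1.7 a_{ji} (\theta_1 - B_{(1,j)} - A_{(1,j)} b_{ji}) / A_{(1,j)}} \right]^2}\, g(\theta_1)\, d\theta_1. \tag{23}$$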

There are a number of well established numerical methods which can be used to minimize the
loss function $Q$. The Davidon-Fletcher-Powell (DFP) algorithm was selected to minimize Equation
17. A computer program in the C language was developed using a set of subroutines provided in
Numerical Recipes (Press, Teukolsky, Vetterling & Flannery, 1992) to perform this algorithm. This
program is available from the author.
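To make the minimization step concrete, the following Python sketch applies a small DFP routine to the two-pool loss function. It is not the author's C program and does not use the Numerical Recipes subroutines: the backtracking line search, the central-difference gradient (used here in place of the analytic derivatives), the equally weighted grid, the starting point, and the synthetic item pool are all assumptions made for illustration.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    # Same curve as the 3PL logistic; the tanh form avoids overflow in exp.
    return c + (1.0 - c) * 0.5 * (1.0 + math.tanh(0.5 * D * a * (theta - b)))

PTS = [-4.0 + 0.2 * n for n in range(41)]   # uniform stand-in for g (assumption)

def epc(item):
    return sum(p3pl(t, *item) for t in PTS) / len(PTS)

def make_loss(pool1, pool2):
    s1 = [epc(item) for item in pool1]
    def loss(x):
        A, B = x
        return sum((s - epc((a / A, A * b + B, c))) ** 2
                   for s, (a, b, c) in zip(s1, pool2))
    return loss

def num_grad(f, x, h=1e-6):
    """Central-difference gradient."""
    grad = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def dfp(f, x0, tol=1e-8, max_iter=500):
    n = len(x0)
    x, g = list(x0), num_grad(f, x0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse-Hessian estimate
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        fx, t = f(x), 1.0
        while t > 1e-12:                       # backtracking (Armijo) line search
            xn = [xi + t * di for xi, di in zip(x, d)]
            if xn[0] > 0.01 and f(xn) <= fx + 1e-4 * t * slope:
                break                          # slope A is kept safely positive
            t *= 0.5
        else:
            break                              # no acceptable step found
        gn = num_grad(f, xn)
        s = [p - q for p, q in zip(xn, x)]
        y = [p - q for p, q in zip(gn, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-18:                         # curvature condition for the DFP update
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
        x, g = xn, gn
    return x

# Hypothetical items on scale 1, re-expressed exactly on a second scale.
pool1 = [(0.8, -1.0, 0.20), (1.1, -0.3, 0.20), (0.9, 0.4, 0.15),
         (1.3, 0.9, 0.25), (0.7, 1.5, 0.20)]
A0, B0 = 1.15, 0.20
pool2 = [(a * A0, (b - B0) / A0, c) for a, b, c in pool1]

A_hat, B_hat = dfp(make_loss(pool1, pool2), [1.0, 0.0])
print(A_hat, B_hat)
```

Starting from the identity transformation, the routine should return coefficients close to the generating values of 1.15 and 0.20, since the synthetic data contain no calibration error. With real estimates the minimum of the loss is positive and the recovered coefficients depend on the choice of ability distribution.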

Example One: Transformation for Two Item Pools

The data used in this example were obtained from administering two forms of a mathematics
test, Form 1 and Form 2, to two nonequivalent groups of examinees (Group I consisted of 1637
examinees and Group J consisted of 1636 examinees). Both forms of the test consisted of 36
multiple choice items drawn from two item pools, Pool 1 and Pool 2, 11 of which were common
items. These data were used in the examples in Kolen and Brennan (1995) and are available from
the authors. The ability and item parameters were calibrated using BILOG (Mislevy & Bock,
1991).

The common scale transformation was performed over the 11 common items using the
proposed method. A uniform distribution on the interval between -4 and 4 was used as the $\theta$
distribution to compute the EPC scores. This interval was the same as the one BILOG used to
calibrate the item parameters. The item parameter estimates and the EPC scores on the original
scales are presented in Table 1. The c-parameter estimates are not presented because they are not
affected by the scale transformation. It can be seen that the EPC scores for the common items in the
two pools are quite different. For example, for item 10, the EPC score is 0.369 given the parameter
estimates from Pool 1 and 0.412 given the parameter estimates from Pool 2. This difference is
mainly caused by the difference between the two scales.

The transformed parameter estimates were computed for the common scale (scale 1) by the
proposed common scale transformation method. The transformed a- and b-parameter estimates and
the original c-parameter estimates were used to compute the EPC scores. It can be seen from Table
1 that the EPC scores for the common items are very close for the two item pools. The small
differences can be attributed to the errors involved in the process of parameter calibration and scale
transformation.
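The estimated slope and intercept are not reported in the text, but they can be backed out of Table 1, because the transformed b-parameter estimates for Pool 2 are a linear function of the original ones. The sketch below fits a straight line to the two b columns for Pool 2 as printed in Table 1; the function name `fit_line` is hypothetical, and the recovered values are accurate only to the three-decimal rounding of the table.

```python
# Pool 2 b-parameter estimates for the 11 common items, copied from Table 1.
b_original = [-1.335, -1.321, -0.710, -0.354, 0.532, -0.116,
              2.554, 0.581, 1.896, 1.079, 2.134]
b_common = [-1.341, -1.325, -0.623, -0.214, 0.805, 0.060,
            3.129, 0.861, 2.373, 1.434, 2.646]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

A, B = fit_line(b_original, b_common)
print(round(A, 3), round(B, 3))

# Cross-check against the a column: item 5 in Pool 2 has a = 0.990 on the
# original scale and 0.861 on the common scale in Table 1.
print(round(0.990 / A, 3))
```

The implied slope is about 1.15 and the implied intercept about 0.19, and dividing the original a-parameter estimates by the slope reproduces the transformed a column to within rounding.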

Example Two: Transformation for Three Item Pools 

The item parameter estimates of the 11 common items for Pool 1 used in Example One were
used in this example as true parameters. These item parameters were linearly transformed to three
different scales representing scales for three independently calibrated item pools: Pools 1, 2 and 3.
These parameters are presented in Table 2 under the label "On Original Scales". The EPC scores for
the common items on the original scales are also presented in Table 2. Because these three scales
were different, the EPC scores for the same common item are not the same for the three "from" scales.

The common scale transformation was performed in the same way as in Example One. The
transformed parameters and EPC scores are presented in Table 2 under the label "On Common
Scale". Because the parameters on the three "from" scales were linear transformations of the true
parameters, no parameter calibration errors were involved. After the common scale transformation,
the parameters were placed on a common scale, and the a and b parameters and the EPC scores for the
common items in all three item pools became identical.

Conclusion 

The proposed common scale transformation method provides an alternative approach for
solving the problem of incompatible IRT parameter estimates calibrated with two or more
independent samples of subjects. It is a useful method for finding a common scale for item
parameters from multiple item pools calibrated independently.

Table 1. The Untransformed and Transformed Item Parameter Estimates and EPC Scores For
Example One

                        On Original Scales            On Common Scale
Item ID   Pool ID      a         b      EPC         a         b      EPC
   1         1       0.455    -0.710    0.666      0.455    -0.710    0.666
   1         2       0.442    -1.335    0.699      0.384    -1.341    0.695
   2         1       0.584    -0.857    0.680      0.584    -0.857    0.680
   2         2       0.573    -1.321    0.717      0.498    -1.325    0.715
   3         1       0.754     0.021    0.578      0.754     0.021    0.578
   3         2       0.599    -0.710    0.631      0.521    -0.623    0.621
   4         1       0.663     0.051    0.557      0.663     0.051    0.557
   4         2       0.604    -0.354    0.578      0.525    -0.214    0.563
   5         1       1.069     0.961    0.569      1.069     0.961    0.569
   5         2       0.990     0.532    0.607      0.861     0.805    0.585
   6         1       0.967     0.195    0.505      0.967     0.195    0.505
   6         2       0.808    -0.116    0.545      0.703     0.060    0.526
   7         1       0.348     2.277    0.388      0.348     2.277    0.388
   7         2       0.414     2.554    0.424      0.360     3.129    0.398
   8         1       1.458     1.024    0.531      1.458     1.024    0.531
   8         2       1.355     0.581    0.559      1.179     0.861    0.533
   9         1       0.702     2.240    0.307      0.702     2.240    0.307
   9         2       0.634     1.896    0.340      0.551     2.373    0.299
  10         1       1.408     1.556    0.369      1.408     1.556    0.369
  10         2       1.135     1.079    0.412      0.987     1.434    0.373
  11         1       1.299     2.159    0.325      1.299     2.159    0.325
  11         2       0.926     2.134    0.344      0.805     2.646    0.297

Table 2. The Untransformed and Transformed Item Parameters and EPC Scores For Example Two

                        On Original Scales            On Common Scale
Item ID   Pool ID      a         b      EPC         a         b      EPC
   1         1       0.501    -0.736    0.670      0.501    -0.736    0.670
   1         2       0.523    -0.748    0.671      0.501    -0.736    0.670
   1         3       0.546    -0.800    0.676      0.501    -0.736    0.670
   2         1       0.642    -0.870    0.682      0.642    -0.870    0.682
   2         2       0.671    -0.875    0.683      0.642    -0.870    0.682
   2         3       0.701    -0.922    0.688      0.642    -0.870    0.682
   3         1       0.830    -0.072    0.587      0.830    -0.072    0.587
   3         2       0.868    -0.112    0.591      0.830    -0.072    0.587
   3         3       0.905    -0.191    0.599      0.830    -0.072    0.587
   4         1       0.730    -0.045    0.567      0.730    -0.045    0.567
   4         2       0.763    -0.086    0.571      0.730    -0.045    0.567
   4         3       0.796    -0.166    0.579      0.730    -0.045    0.567
   5         1       1.176     0.783    0.584      1.176     0.783    0.584
   5         2       1.229     0.705    0.591      1.176     0.783    0.584
   5         3       1.283     0.593    0.600      1.176     0.783    0.584
   6         1       1.064     0.086    0.517      1.064     0.086    0.517
   6         2       1.112     0.039    0.522      1.064     0.086    0.517
   6         3       1.161    -0.046    0.532      1.064     0.086    0.517
   7         1       0.383     1.979    0.404      0.383     1.979    0.404
   7         2       0.400     1.849    0.412      0.383     1.979    0.404
   7         3       0.417     1.689    0.424      0.383     1.979    0.404
   8         1       1.604     0.840    0.547      1.604     0.840    0.547
   8         2       1.677     0.760    0.555      1.604     0.840    0.547
   8         3       1.750     0.645    0.565      1.604     0.840    0.547
   9         1       0.772     1.946    0.335      0.772     1.946    0.335
   9         2